Educational Software


MasterClass is 50% off today. It's worth it just for the entertainment

PCWorld

Until May 10th, MasterClass annual plans start at $60/year. It's great for casual learners who want high-quality, entertaining courses from big names. With the job market being what it is, there's never been a better time to learn new skills (or brush up on old ones).


A single algorithm for both restless and rested rotting bandits

Seznec, Julien, Ménard, Pierre, Lazaric, Alessandro, Valko, Michal

arXiv.org Machine Learning

In many application domains (e.g., recommender systems, intelligent tutoring systems), the rewards associated with the actions tend to decrease over time. This decay is caused either by the actions executed in the past (e.g., a user may get bored when songs of the same genre are recommended over and over) or by an external factor (e.g., content becomes outdated). These two situations can be modeled as specific instances of the rested and restless bandit settings, where arms are rotting (i.e., their values decrease over time). These problems were thought to be significantly different, since Levine et al. (2017) showed that state-of-the-art algorithms for restless bandits perform poorly in the rested rotting setting. In this paper, we introduce a novel algorithm, Rotting Adaptive Window UCB (RAW-UCB), that achieves near-optimal regret in both rotting rested and restless bandits, without any prior knowledge of the setting (rested or restless) or of the type of non-stationarity (e.g., piece-wise constant, bounded variation). This is in striking contrast with previous negative results showing that no algorithm can achieve similar results as soon as rewards are allowed to increase. We confirm our theoretical findings on a number of synthetic and dataset-based experiments.
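To make the "adaptive window" idea concrete, here is a minimal Python sketch of one plausible reading of it, not the authors' reference implementation. It assumes that each arm's index is the minimum, over all windows of its most recent h rewards, of the windowed mean plus a confidence width. The width formula, the parameters `sigma` and `alpha`, and the names `raw_ucb_index` and `run_rested` are assumptions introduced for illustration; the abstract does not specify them.

```python
import math
import random


def raw_ucb_index(rewards, t, sigma=1.0, alpha=4.0):
    """Adaptive-window index for one arm (illustrative assumption).

    Takes the minimum, over every window of the h most recent rewards,
    of (windowed mean + confidence width). The width
    sigma * sqrt(2 * alpha * log(t) / h) is an assumed form.
    """
    if not rewards:
        return math.inf  # unpulled arms are tried first
    best = math.inf
    window_sum = 0.0
    for h in range(1, len(rewards) + 1):
        window_sum += rewards[-h]  # extend the window one step into the past
        width = sigma * math.sqrt(2.0 * alpha * math.log(max(t, 2)) / h)
        best = min(best, window_sum / h + width)
    return best


def run_rested(arm_means, horizon, sigma=0.1, seed=0):
    """Simulate a rested rotting bandit.

    arm_means[i](n) is the (hypothetical) mean reward of arm i after it
    has already been pulled n times; it should be non-increasing in n.
    """
    rng = random.Random(seed)
    history = [[] for _ in arm_means]
    pulls = []
    for t in range(1, horizon + 1):
        indices = [raw_ucb_index(h, t, sigma) for h in history]
        arm = max(range(len(indices)), key=indices.__getitem__)
        reward = arm_means[arm](len(history[arm])) + rng.gauss(0.0, sigma)
        history[arm].append(reward)
        pulls.append(arm)
    return pulls


# Example: arm 0 starts high but decays with each pull, arm 1 stays constant.
pulls = run_rested([lambda n: 1.0 / (1 + n), lambda n: 0.4], horizon=200)
```

Taking the minimum over windows lets recent rewards override stale ones when an arm has decayed: because rewards can only decrease, every windowed mean over past pulls is an optimistic estimate of the arm's current value, so the smallest upper bound is the tightest one available.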



Improving Environment Novelty Quantification for Effective Unsupervised Environment Design

Neural Information Processing Systems

Unsupervised Environment Design (UED) formalizes the problem of autocurricula through interactive training between a teacher agent and a student agent. The teacher generates new training environments with high learning potential, curating an adaptive curriculum that strengthens the student's ability to handle unseen scenarios. Existing UED methods mainly rely on regret, a metric that measures the difference between the agent's optimal and actual performance, to




SocraticLM: Exploring Socratic Personalized Teaching with Large Language Models

Neural Information Processing Systems

Large language models (LLMs) are considered a crucial technology for advancing intelligent education since they exhibit the potential for an in-depth understanding of teaching scenarios and for providing students with personalized guidance. Nonetheless, current LLM-based applications in personalized teaching predominantly follow a "Question-Answering" paradigm, where students are passively provided with answers and explanations. In this paper, we propose SocraticLM, which achieves a Socratic "Thought-Provoking" teaching paradigm that fulfills the role of a real classroom teacher in actively engaging students in the thought


Scalable Early Childhood Reading Performance Prediction

Neural Information Processing Systems

Currently, students are identified as needing additional educational support using a 'wait-to-fail' approach, i.e., waiting until a child has not made expected gains in reading before there is a reevaluation of their instructional needs.